Diffusion Model-based FOD Restoration from High Distortion in dMRI

Fiber orientation distributions (FODs) are a popular model for representing diffusion MRI (dMRI) data. However, imaging artifacts such as susceptibility-induced distortion in dMRI can cause signal loss and lead to corrupted reconstruction of FODs, which prohibits successful fiber tracking and connectivity analysis in affected brain regions such as the brain stem. Generative models, such as diffusion models, have been successfully applied in various image restoration tasks. However, their application to FOD images poses unique challenges since FODs are 4-dimensional data represented by spherical harmonics (SPHARM), with the fourth dimension exhibiting order-related dependency. In this paper, we propose a novel diffusion model for FOD restoration that can recover the signal loss caused by distortion artifacts. We use volume-order encoding to enhance the ability of the diffusion model to generate individual FOD volumes at all SPHARM orders. Moreover, we add cross-attention features extracted across all SPHARM orders when generating every individual FOD volume to capture the order-related dependency across FOD volumes. We also condition the diffusion model on low-distortion FODs surrounding high-distortion areas to maintain the geometric coherence of the generated FODs. We trained and tested our model using data from the UK Biobank (n = 1315). On a test set with ground truth (n = 43), we demonstrate the high accuracy of the generated FODs in terms of root mean square errors of FOD volumes and angular errors of FOD peaks. We also apply our method to a test set with large distortion in the brain stem area (n = 1172) and demonstrate the efficacy of our method in restoring the FOD integrity and, hence, greatly improving tractography performance in affected brain regions.


Introduction
Fiber orientation distribution (FOD) is a popular representation [3,6] to model the configuration of fascicle trajectories based on high-resolution diffusion MRI (dMRI) and has been widely used in modern tractography techniques for the reconstruction of major fiber bundles [1]. Severe distortion artifacts, however, can prohibit the computation of valid FODs due to dMRI signal loss. While various distortion correction methods [23,18,22] have been proposed to alleviate this problem, severe residual distortions are still widely present and pose a significant challenge to the successful reconstruction of fiber trajectories in regions with strong artifacts [2] (Fig. 1). Building upon recent successes of diffusion models [9] in various image restoration and synthesis tasks [16], we propose in this work a novel diffusion model-based method for the generative recovery of FODs in brain regions with severe signal loss.
For the generative restoration [25,26,7,15] and synthesis [30,31] of natural images, denoising diffusion probabilistic models (DDPM) [9,16] have achieved great success because they were shown to be more stable in training and have more controllable generation procedures than other generative models such as GANs [29] and VAEs [33]. The application of diffusion models for FOD restoration, however, poses unique challenges. First, FODs are typically represented by spherical harmonics (SPHARM) up to a maximum order L [6]. The FODs are thus 4-dimensional images because they consist of multiple 3-D volumes at each SPHARM order. Generating all FOD volumes together would require large GPU memory and an enormous number of parameters. Second, while it is memory-efficient to treat all FOD volumes equally in training a generative DDPM model, the order-dependency of the FOD volumes can make it hard to ensure the validity of the generated FODs, as shown in Fig. 2(a) and (b). Third, the generated FODs need to maintain geometric coherence with surrounding voxels that have low distortion and more reliable representations.
In this work, we develop a novel diffusion model-based method, namely the FOD-Diffusion model, for the recovery of FODs in brain regions with high distortions. We design a volume order-aware diffusion model so that FOD-Diffusion is suitable for generating FOD volumes in both high- and low-frequency orders. We also provide a frequency-balanced cross-attention for extracting related information from all FOD volumes to help the generation of individual FOD volumes. Experiments on the large-scale UK Biobank dataset (n = 1315) [24,32] demonstrate that our method successfully restores FODs from dMRI signal loss due to large residual distortion artifacts and hence greatly improves fiber tracking through affected brain stem regions.

Method
A detailed illustration of the proposed FOD-Diffusion model is shown in Fig. 3. We explain each aspect of the model in detail in the following sections.

Order-aware Diffusion Model
We propose a volume order-aware diffusion model to enable the diffusion model to generate FODs across all frequency orders. First, inspired by the time encoding in the diffusion model [9,8], we propose a volume and frequency order-aware encoding (referred to as volume encoding) to enhance the model's ability to distinguish FODs of different frequencies and orientations. The volume encoding encodes an array [L, V], where L and V are the frequency order number and the volume number of the single FOD volume, respectively.
Moreover, we use the low-signal-loss regions as a condition to provide the model with sufficient constraints during generation. In the low-signal-loss data, we set the voxels of the high-signal-loss regions to the value 1 to distinguish them from the background.
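To make the [L, V] indexing concrete, the following minimal sketch enumerates the pairs for even-order SPHARM up to L = 8 (2L + 1 volumes per order, 45 volumes in total) and builds an embedding for each pair. The sinusoidal form, the embedding width, the choice of counting V within each order, and all function names are assumptions that mirror the usual diffusion time encoding; they are not taken from the authors' implementation.

```python
import numpy as np

def order_volume_pairs(max_order=8):
    """List (L, V) for every SPHARM volume: even orders only, 2L+1 volumes per order."""
    return [(L, V) for L in range(0, max_order + 1, 2) for V in range(2 * L + 1)]

def sinusoidal_embedding(value, dim=16, max_period=10000.0):
    """Standard sin/cos embedding of a scalar index (same form as a DDPM time encoding)."""
    half = dim // 2
    freqs = np.exp(-np.log(max_period) * np.arange(half) / half)
    angles = value * freqs
    return np.concatenate([np.sin(angles), np.cos(angles)])

def volume_encoding(L, V, dim=16):
    """Hypothetical volume encoding: concatenate the embeddings of L and V."""
    return np.concatenate([sinusoidal_embedding(L, dim), sinusoidal_embedding(V, dim)])

pairs = order_volume_pairs(8)
codes = np.stack([volume_encoding(L, V) for L, V in pairs])
print(len(pairs), codes.shape)  # 45 (45, 32)
```

Under these assumptions every one of the 45 volumes receives a distinct code, which is the property the network needs in order to tell the volumes apart.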
We use the $L_1$ loss during training and increase the contribution of the high-distortion regions to the loss, as shown in Eq. 1, where $\hat{x}$ and $x$ represent the generated FODs and the ground truth FODs, respectively. The indicator function $\mathbb{1}(x)$ equals $x$ within the mask of the high-signal-loss regions and 0 in all other voxels.
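One way to read this loss is as a global L1 term plus a second L1 term restricted to the high-signal-loss mask. The sketch below follows that reading. The relative weight `region_weight`, the per-term normalization, and the function name are assumptions, since the text does not state them.

```python
import numpy as np

def weighted_l1_loss(generated, target, mask, region_weight=1.0):
    """Global L1 loss plus an extra L1 term inside the high-signal-loss mask.

    region_weight is a hypothetical knob; the source does not give the weighting.
    """
    abs_err = np.abs(generated - target)
    global_term = abs_err.mean()
    # Indicator function: keep values inside the mask, zero elsewhere.
    region_term = (abs_err * mask).sum() / max(mask.sum(), 1)
    return global_term + region_weight * region_term

# Toy example: the error is confined to an 8-voxel masked block of a 4x4x4 volume.
mask = np.zeros((4, 4, 4), dtype=bool)
mask[:2, :2, :2] = True
target = np.ones((4, 4, 4))
generated = np.where(mask, 0.0, 1.0)
loss = weighted_l1_loss(generated, target, mask)
```

In this toy case the global term is 8/64 and the region term is 1, so errors inside the mask dominate the loss.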

Frequency-balanced Cross-attention
The FODs represented by SPHARM have interdependence between different volumes, and we model the relationship between different FOD volumes to help the generation.We propose a frequency-balanced cross-attention method that extracts features from all FOD volumes and feeds them to the U-Net in the diffusion model through the cross-attention [5].This method achieves balanced attention across each frequency order.
As shown in Fig. 3, we average the FOD volumes within each frequency order so that each order is reduced to a single volume. Order L = 0 has only one volume, so it is used directly. We then use copies of the diffusion model's U-Net, with time encoding t = 0 and volume encoding V, to extract cross-attention features from the averaged FOD volume of each frequency order. These features correspond to the output of the self-attention of each decoding block. Afterward, a frequency order-aware convolution with kernel size 1 is performed to select and combine the information that is most relevant to the FOD volume that needs to be generated. The parameters of the copied U-Nets are frozen during training; whenever the U-Net in the diffusion model is updated, its parameters are copied to the copied U-Nets. This approach reduces the number of parameters that need to be trained; hence, it reduces the training time and helps prevent overfitting.
More details of the frequency order-aware convolution are shown in Fig. 4. First, we use sine and cosine functions to embed the order number. Afterward, a multilayer perceptron (MLP) layer is added to calculate the encoding weights $[e_1, e_2, \ldots, e_n]^T$. Eq. 2 is used to encode the features F from each frequency order by adding the encoding weights, where m = 0, 2, ..., 8 is the frequency order and J denotes the all-ones matrix.
The combined features from the 1×1 Conv are fed into the cross-attention through the Key and the Value in the U-Net of the diffusion model, and the Query is the feature from the single volume. The most useful information from other FOD volumes is selected from the Value to enhance the restoration of the current FOD volume under consideration [5].
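The per-order averaging step can be sketched as follows. The sketch assumes that the 45 volumes are stored in increasing order (one volume for L = 0, five for L = 2, and so on), which is a common storage convention but is not stated in the text; the function name is hypothetical.

```python
import numpy as np

def average_per_order(fod, max_order=8):
    """Average the 2L+1 volumes of each even order L.

    fod has shape (n_volumes, X, Y, Z), with volumes assumed sorted by order.
    Returns one averaged volume per order, shape (max_order // 2 + 1, X, Y, Z).
    """
    averaged, start = [], 0
    for L in range(0, max_order + 1, 2):
        count = 2 * L + 1
        averaged.append(fod[start:start + count].mean(axis=0))
        start += count
    if start != fod.shape[0]:
        raise ValueError("unexpected number of SPHARM volumes")
    return np.stack(averaged)

# Toy input: volume i is filled with the constant i, so the averages are easy to check.
fod = np.arange(45, dtype=float)[:, None, None, None] * np.ones((1, 2, 2, 2))
per_order = average_per_order(fod)  # shape (5, 2, 2, 2)
```

For L = 0 the average is the volume itself, which matches the statement that this order can be used directly.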
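The following sketch illustrates one possible reading of the order-aware encoding: each order m is embedded with sine and cosine functions, an MLP maps the embedding to a weight e_m, and e_m J is added to the feature map of that order before a 1×1 convolution mixes the orders. Using one scalar weight per order, the randomly initialized MLP, the embedding width, and all names are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
ORDERS = (0, 2, 4, 6, 8)

def order_embedding(m, dim=8):
    """Sine/cosine embedding of the frequency order m."""
    half = dim // 2
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)
    return np.concatenate([np.sin(m * freqs), np.cos(m * freqs)])

# Hypothetical two-layer MLP mapping each embedding to one scalar weight e_m.
W1, b1 = rng.normal(size=(8, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 1)), np.zeros(1)

def encoding_weight(m):
    hidden = np.maximum(order_embedding(m) @ W1 + b1, 0.0)  # ReLU
    return float((hidden @ W2 + b2)[0])

def encode_features(features_by_order):
    """Add e_m * J (J = all-ones matrix) to the feature map F_m of each order m."""
    return {m: F + encoding_weight(m) * np.ones_like(F)
            for m, F in features_by_order.items()}

def conv1x1(stacked, weight):
    """1x1 convolution as a per-voxel linear mix; stacked: (C_in, ...), weight: (C_out, C_in)."""
    return np.tensordot(weight, stacked, axes=([1], [0]))

features = {m: np.zeros((4, 4)) for m in ORDERS}
encoded = encode_features(features)
stacked = np.stack([encoded[m] for m in ORDERS])  # (5, 4, 4)
combined = conv1x1(stacked, np.ones((1, 5)))      # (1, 4, 4)
```

In a trained model the MLP and the 1×1 convolution weights would be learned, so that the combination emphasizes the orders most relevant to the volume being generated.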
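The cross-attention step can be illustrated with standard scaled dot-product attention, in which the Query comes from the volume being generated and the Key and Value come from the combined all-order features. Token counts, channel width, and the projection matrices below are placeholders; this is a generic sketch, not the authors' U-Net block.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_feats, context_feats, Wq, Wk, Wv):
    """query_feats: (Nq, C) tokens of the volume being generated.
    context_feats: (Nk, C) tokens from the combined all-order features."""
    Q, K, V = query_feats @ Wq, context_feats @ Wk, context_feats @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = softmax(scores, axis=-1)  # each query token attends over all context tokens
    return weights @ V, weights

rng = np.random.default_rng(0)
C = 4
query_feats = rng.normal(size=(6, C))
context_feats = rng.normal(size=(10, C))
Wq, Wk, Wv = (rng.normal(size=(C, C)) for _ in range(3))
attended, attn_weights = cross_attention(query_feats, context_feats, Wq, Wk, Wv)
```

Each row of `attn_weights` sums to one, so every output token is a convex combination of Value tokens drawn from the other FOD volumes.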

Dataset and Preprocessing
Our FOD-Diffusion method was trained, validated, and tested on the UK Biobank dataset. Overall, this study utilized n = 1315 subjects. The subjects were randomly drawn from the dataset, with ages ranging from 30 to 80 years.
Fig. 5. Examples of the ground truth FODs and the FODs of our FOD-Diffusion model from the test set. In particular, (a) is the same case as in Fig. 2.
All extracted data were pre-processed using Topup [18] and Eddy [20,21] to correct for distortion, eddy currents, and head motion. Then we calculated the FODs with the highest order L = 8 using the method in Ref. [6]. To extract low-residual-distortion cases for model training, we used the method in Ref. [4] to calculate the severity map of residual distortion, and we extracted 143 subjects that have both low mean signal loss (< 0.25) and high mean FOD integrity (> 0.09), which account for about 10% of the data, for model training, validation, and testing. We used n = 90 subjects for training, n = 10 for validation, and n = 43 for testing. We masked out regions in the brainstem of these data to act as the high-signal-loss regions. The remaining n = 1172 subjects, which have high residual distortion, formed a large test set for evaluating the model on real data with high distortion. The masks of high-distortion regions were extracted based on the residual distortion severity map.
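The subject selection and split described above can be sketched as follows, using the stated thresholds (mean signal loss < 0.25, mean FOD integrity > 0.09) and split sizes (90/10/43). The random shuffling, the seed, and the function names are assumptions; the text does not say how subjects were assigned to the three subsets.

```python
import numpy as np

def select_low_distortion(mean_signal_loss, mean_fod_integrity,
                          loss_thresh=0.25, integrity_thresh=0.09):
    """Return (low-distortion indices, remaining high-distortion indices)."""
    keep = (mean_signal_loss < loss_thresh) & (mean_fod_integrity > integrity_thresh)
    return np.flatnonzero(keep), np.flatnonzero(~keep)

def split_subjects(indices, n_train=90, n_val=10, seed=0):
    """Shuffle and split into train/validation/test (random assignment is an assumption)."""
    shuffled = np.random.default_rng(seed).permutation(indices)
    return shuffled[:n_train], shuffled[n_train:n_train + n_val], shuffled[n_train + n_val:]

# Synthetic per-subject summaries: 143 of 1315 subjects pass both thresholds.
signal_loss = np.full(1315, 0.5)
integrity = np.full(1315, 0.05)
signal_loss[:143], integrity[:143] = 0.1, 0.2
low, high = select_low_distortion(signal_loss, integrity)
train, val, test = split_subjects(low)
```

With these synthetic values the low-distortion set has 143 subjects and the remaining 1172 form the high-distortion test set, matching the counts reported in the text.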

Model Training and Validation
The model was trained for 100,000 iterations with a batch size of 8. We used the AdamW optimizer, with a learning rate of $10^{-5}$ initially and $10^{-6}$ after 70,000 iterations. We validated the model in every epoch, and the model with the lowest validation loss was selected. The training took about 68.5 hours on an NVIDIA RTX A5000 GPU, with about 22 GB of GPU memory. We used DDPM with 1000 time steps for inference, and v-prediction [27] was used in all experiments. The inference time for a single FOD volume was about 90 seconds.
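As an illustration of two of these settings, the sketch below encodes the step learning-rate schedule and the v-prediction target, v = alpha_t * noise - sigma_t * x0 with alpha_t^2 + sigma_t^2 = 1, together with the recovery of x0 from a predicted v. Treating iteration 70,000 (zero-based) as the first low-rate step and the example noise level are assumptions; function names are hypothetical.

```python
import numpy as np

def learning_rate(iteration, base_lr=1e-5, late_lr=1e-6, switch_at=70_000):
    """Step schedule: base_lr for the first switch_at iterations, late_lr afterwards."""
    return base_lr if iteration < switch_at else late_lr

def v_target(x0, noise, alpha_t, sigma_t):
    """v-prediction training target for x_t = alpha_t * x0 + sigma_t * noise."""
    return alpha_t * noise - sigma_t * x0

def x0_from_v(x_t, v, alpha_t, sigma_t):
    """Recover the clean sample from x_t and v (valid when alpha_t**2 + sigma_t**2 == 1)."""
    return alpha_t * x_t - sigma_t * v

rng = np.random.default_rng(0)
x0, noise = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))
alpha_t, sigma_t = np.cos(0.3), np.sin(0.3)  # example noise level
x_t = alpha_t * x0 + sigma_t * noise
recovered = x0_from_v(x_t, v_target(x0, noise, alpha_t, sigma_t), alpha_t, sigma_t)
```

The round trip returns x0 exactly, which is why a network trained to predict v can still be used to estimate the clean FOD volume at each denoising step.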

Ablation Studies
We then quantitatively compared our model with the unconditional DDPM and two ablation studies. The first ablation only takes the low-signal-loss FOD volume (named vol) as the condition, and the second one also has the volume encoding (named enc). Table 1 shows the root mean squared error of the brain stem results from the different methods. The unconditional diffusion model failed to generate valid FODs. The results of the ablation study show that all the improvements we added helped to improve the accuracy of the generated FODs.
We also compared the geometric differences between different methods, following the work in Ref. [22]. We first extracted, from both the ground truth FODs and the result FODs, the directions of the three FOD peaks with the largest amplitudes among those with peak values larger than 0.5. Afterward, for the highest and second-highest peaks of the ground truth FODs (called "1st peak" and "2nd peak"), we calculated the smallest angular difference to the peaks of the result FODs. The results are shown in Table 2. All ablation models substantially outperform the unconditional diffusion model, and our FOD-Diffusion model has smaller angular differences and standard deviations than the other models in the ablation studies.
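The error reported in Table 1 (titled Root Mean Squared Errors) can be computed along the following lines. Pooling all SPHARM volumes and all masked voxels into a single RMSE value is an assumption, as is the function name.

```python
import numpy as np

def masked_rmse(generated, target, mask):
    """RMSE over all SPHARM volumes, restricted to voxels where mask is True.

    generated, target: (n_volumes, X, Y, Z); mask: (X, Y, Z) boolean.
    """
    diff = (generated - target)[:, mask]  # shape (n_volumes, n_masked_voxels)
    return float(np.sqrt(np.mean(diff ** 2)))

# Toy example: an error of 2 at one of the two masked voxels, in every volume.
generated = np.zeros((45, 3, 3, 3))
target = np.zeros((45, 3, 3, 3))
target[:, 1, 1, 1] = 2.0
mask = np.zeros((3, 3, 3), dtype=bool)
mask[1, 1, 1] = mask[0, 0, 0] = True
rmse = masked_rmse(generated, target, mask)
```

Restricting the metric to the brain stem mask keeps the large unaffected background from diluting the error.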
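The peak comparison can be sketched as below. Because FODs built from even-order SPHARM are antipodally symmetric, the sketch treats a peak direction v and its opposite -v as the same axis; applying this convention here, and the function name, are assumptions about the evaluation rather than details given in the text.

```python
import numpy as np

def smallest_angular_difference(gt_peak, result_peaks):
    """Smallest angle (degrees) between one ground-truth peak and any result peak.

    gt_peak: (3,) direction; result_peaks: (K, 3) directions. Peaks are treated
    as axes, so v and -v count as the same direction.
    """
    gt = gt_peak / np.linalg.norm(gt_peak)
    res = result_peaks / np.linalg.norm(result_peaks, axis=1, keepdims=True)
    cosines = np.clip(np.abs(res @ gt), 0.0, 1.0)
    return float(np.degrees(np.arccos(cosines.max())))

angle_same_axis = smallest_angular_difference(
    np.array([0.0, 0.0, 1.0]), np.array([[1.0, 0.0, 0.0], [0.0, 0.0, -1.0]]))
angle_diagonal = smallest_angular_difference(
    np.array([1.0, 0.0, 0.0]), np.array([[1.0, 1.0, 0.0]]))
```

The first call returns 0 degrees because the flipped peak lies on the same axis, and the second returns 45 degrees.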

Performance on Data with High Distortion
Because we do not have ground truth for high-distortion data, we use the group-wise distribution of FOD integrity at the pons region to evaluate the FOD-Diffusion model on the high-signal-loss UK Biobank data (n = 1172).
Here we compare our FOD-Diffusion method with the Topup method, as it is a representative registration-based method and the most widely used method in connectome research.
Fig. 6 shows the distribution of FOD integrity at the pons region before and after the signal loss recovery. The 1172 subjects are divided approximately evenly into five groups based on the severity of signal loss, named the "very slight", "slight", "medium", "severe", and "very severe" residual distortion groups, respectively. Each of the first four groups has 236 subjects, and the last group has 228 subjects. Following the work in Ref. [4], the mean intensity of the L = 0 component at the pons ROIs shown in Fig. 6(a) (the atlas is defined in Ref. [28]) was calculated and used as a measure of FOD integrity. This evaluation method is efficient because the intensity of the L = 0 component represents the mean FOD value [6], and the mean FOD decreases as the residual distortion increases. Fig. 6(b) shows that while the FOD integrity of the original FODs decreases as the severity of signal loss increases, the integrity of the restored FODs has similar distributions across the groups. This shows that our method can recover signal loss for data with high residual distortions.
The tractography results for two high-signal-loss cases are shown in Fig. 7. Here, we calculated the corticospinal tract (CST) [11] and the middle cerebellar peduncle (MCP) [12] for both the Topup results and the results of our FOD-Diffusion method. These two fiber bundles share the same region of interest at the pons (the red mask in Fig. 6(a)) for generating the seeds. Fig. 7 shows that the tractography results of the CST using FOD-Diffusion cover more of the pons region, and the MCP generated using FOD-Diffusion covers the whole pons region. Therefore, our FOD-Diffusion method can successfully restore FOD-based fiber connectivity at the pons region.
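The integrity measure and the grouping can be sketched as follows. The sketch assumes the L = 0 component is stored as the first volume and forms the groups by sorting subjects by severity and cutting consecutive chunks of 236, which reproduces the reported group sizes; the text does not state how the groups were actually formed, and the function names are hypothetical.

```python
import numpy as np

def fod_integrity(fod, roi_mask):
    """Mean of the L = 0 volume (assumed to be index 0) inside the ROI."""
    return float(fod[0][roi_mask].mean())

def group_by_severity(severity, group_size=236):
    """Sort subjects by severity and cut into consecutive groups; the last may be smaller."""
    order = np.argsort(severity)
    return [order[i:i + group_size] for i in range(0, len(order), group_size)]

# Toy FOD with a constant L = 0 volume, and synthetic severities for 1172 subjects.
fod = np.zeros((45, 2, 2, 2))
fod[0] = 0.3
integrity = fod_integrity(fod, np.ones((2, 2, 2), dtype=bool))
groups = group_by_severity(np.random.default_rng(0).random(1172))
group_sizes = [len(g) for g in groups]
```

Comparing the distribution of `fod_integrity` values across these groups, before and after restoration, gives the kind of group-wise evaluation summarized in Fig. 6(b).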

Conclusion
In this work, we proposed a diffusion model-based method for FOD restoration to recover the signal loss caused by high residual distortions. For the generation of complex 4D FOD data, our model is memory-efficient and can be trained on one GPU with 24 GB of GPU memory. We demonstrated the performance of our method in brainstem regions using data from the UK Biobank. In data with ground truth (low distortion), we quantitatively validated the accuracy of our generated FODs. For data with high distortion, we demonstrated the restoration of FOD integrity and the potential of restored FODs in helping fiber tracking of important brainstem pathways.
Future work will include testing our method on more datasets from clinical studies, such as the Alzheimer's Disease Neuroimaging Initiative (ADNI) [35] and the Health & Aging Brain among Latino Elders (HABLE) study [34]. We will also test our method in more complex brain regions with high residual distortions, such as the temporal lobe.

Fig. 1. The residual distortion affects FODs and tractography. (a) The Topup method from FSL [18] aligns the b = 0 images from 2 opposite phase encoding directions to correct the susceptibility-induced distortion. (b) Corrupted FODs and failed tractography of the data in (a) illustrate the impact of the signal loss that cannot be recovered by distortion correction. (c) FODs from a low signal loss case and the successful fiber tracking results.

Fig. 2. The unconditioned DDPM model that generates only one FOD volume at a time failed to restore the FODs. (a) The FODs have different numbers of volumes for each order. (b) Errors in individual volumes lead to erroneous FOD representations.

Fig. 3. The framework of the FOD-Diffusion model. Our FOD-Diffusion model takes the low-signal-loss FODs as a condition in generation and uses the volume and order number encoding (denoted as V) to generate FODs in different frequency orders. We also use low-signal-loss FODs from all FOD volumes to extract the cross-attention information and help the generation of each FOD volume.

Fig. 4. Details of the frequency order-aware cross-attention calculation. We use the frequency order encoding to adjust the features from each order, and then we use a convolution layer to combine and select the features.

Fig. 5 shows two examples of the ground truth FODs and the results of our FOD-Diffusion model. These examples are from the test set. FODs generated by our FOD-Diffusion model have high angular similarity with the ground truth. Regarding intensity, although some FODs differ in intensity from the ground truth, the intensity is similar for most FODs.

Fig. 6. Distributions of FOD integrity at the pons region before and after signal loss recovery for high residual distortion data (n = 1172). (a) shows the position of the pons masks, and (b) shows the distribution of FOD integrity.

Fig. 7. Tractography of CST and MCP for 2 high signal loss cases. (a) shows the FODs of the Topup and FOD-Diffusion methods. (b) and (c) are the tractography results of data1 and data2, respectively. "Data1" is the same data as in Fig. 1(b).

Table 1. Root Mean Squared Errors of FODs in Test Set.

Table 2. Angular Differences for FODs in Test Set.